ABSTRACT

The  small-sample  behavior  of  two  kernel-type  density  estimators  which 
have  been  proposed  in  the  literature  for  randomly  right-censored  samples  is 
investigated  via  Monte  Carlo  simulations.  The  extensive  simulation  study  was 
performed  for  five  families  of  life  distributions,  two  different  censoring 
distributions,  three  kernel  functions,  and  several  bandwidth  sequences  and  for 
sample  sizes  from  n=20  to  n=300.  The  simulation  results  reinforce  previous 
theoretical  results  for  the  estimators  and  lead  to  conjectures  about  their 
general  behavior  asymptotically  as  well  as  for  small  samples.  A  comparison  of 
the  two  density  estimators  is  also  indicated. 
1.  INTRODUCTION


Density  estimation  is  a  very  important  topic  in  applied,  as  well  as 
theoretical,  statistics.  In  particular,  nonparametric  procedures  for  estimating 
an  unknown  density  are  extremely  useful  in  determining  the  characteristics  of  a 
statistical  population  being  sampled  and  have  direct  applications  in  many 
inference problems. The modern methods of nonparametric density estimation
have been developed since the early 1950's and lead to smooth estimates which
are  more  suitable  for  inference  than  simple  histogram  estimates.  Most  of  these 
estimators  were  based  on  complete  samples,  that  is,  random  samples  of  size  n 
from  the  unknown  density.  There  have  been  several  reviews  written  which  give 
extensive  bibliographies  of  results  on  nonparametric  density  estimation  from 
complete  samples.  For  example,  see  Wegman  (1972  a,b),  Fryer  (1977),  Tapia  and 
Thompson  (1978),  Wertz  and  Schneider  (1979),  and  Bean  and  Tsokos  (1980). 

Recently, density estimation from incomplete or censored samples has
received  a  great  deal  of  attention.  Right-censored  observations  arise  in  many 
life  testing  situations  and  are  very  common  in  survival  analysis  (Lagakos,  1979). 
Such  data  occur  often  in  medical  trials  when  patients  may  enter  treatment  at 
different times and then either die from the disease, or cause, under investigation
or leave the study before it is terminated (move away or die from another
competing  cause).  Also,  in  industrial  life  testing,  items  may  be  removed  from 
the  study  at  various  times  for  more  extensive  analysis  or  for  other  reasons. 

For  such  situations,  it  is  of  interest  to  obtain  nonparametric  estimates  of 
the  density  function  of  the  lifetime  variable  based  on  the  right-censored  data. 
The  development  of  such  density  (or  related  function)  estimates  has  only 
recently  been  considered,  and  a  survey  of  known  results  was  given  by  Padgett 
and  McNichols  (1984).  The  developments  for  censored  data  have  followed  the 


same  basic  approaches  as  for  the  complete-sample  case  but  generally  present 
greater  mathematical  difficulties. 

Kernel  density  estimators  from  randomly  right-censored  data  have  been 
studied  by  several  authors.  A  kernel-type  density  estimator  was  proposed  by  Blum 
and  Susarla  (1980)  and  its  asymptotic  properties  were  studied.  In  particular, 
the  asymptotic  theory  of  the  maximum  deviation  of  their  estimator  was  presented, 
extending  the  results  of  Rosenblatt  (1976)  to  the  case  of  random  right-censorship. 
The  strong  consistency  properties  of  the  kernel  estimator  based  on  the  product- 
limit  estimate  of  the  distribution  function  were  studied  by  Foldes,  Rejto  and 
Winter  (1981).  McNichols  and  Padgett  (1981)  obtained  very  complicated  finite- 
sample  expressions  for  the  kernel  density  estimator  and  showed  that  it  was 
asymptotically  unbiased  and  that  its  variance  approached  zero  as  the  sample  size 
increased,  assuming  the  Koziol  and  Green  (1976)  model  of  random  censorship. 

Also,  a  modification  of  the  kernel  density  estimator  in  which  the  bandwidth 
depended  on  the  data  was  proposed  by  McNichols  and  Padgett  (1984).  However, 
only  asymptotic  properties  were  obtained  in  all  of  these  results,  except  for 
those  by  McNichols  and  Padgett  (1981)  with  respect  to  the  Koziol-Green  model 
which  is  somewhat  restrictive  in  practice. 

It  is  the  purpose  of  this  paper  to  study,  by  fairly  extensive  Monte 
Carlo  simulations,  the  finite-sample  behavior  of  kernel  density  estimators 
based  on  randomly  right-censored  data.  The  simulation  study  was  performed  since 
it  is  very  difficult,  if  not  impossible,  to  obtain  (even  approximate)  expressions 
for  the  biases,  mean-squared  errors,  variances,  and  sampling  distributions  of 
such  estimators  for  finite  sample  sizes  under  general  nonrestrictive  conditions. 
Several  different  families  of  lifetime  distributions,  various  types  of  censoring 
distributions  that  are  assumed  in  practice,  various  bandwidth  sequences,  and 
three different kernel functions were used in the simulations. Since, for
censored data, optimal bandwidth results analogous to those for complete
samples are not available, some attention is given in the simulations to the
behavior of the estimators with respect to the bandwidth.

Randomly  right-censored  data  and  the  product-limit  estimator  will  be 
discussed  in  Section  2.  The  kernel  density  estimators  that  are  to  be  studied 
will  be  given  in  Section  3.  The  computer  simulations  will  be  described  and 
a representative portion of the simulation results will be given in
Section  4.  Finally,  in  Section  5  some  conclusions  concerning  the  small-sample 
behavior  of  the  kernel  estimators  studied  will  be  stated  or  conjectured. 


2.  RANDOMLY  RIGHT-CENSORED  SAMPLES 

Let $X_1^\circ, X_2^\circ, \ldots, X_n^\circ$ denote the true survival times of n items or
individuals which are censored on the right by a sequence $U_1, U_2, \ldots, U_n$,
which in general may be either constants or random variables. It is assumed
that the $X_i^\circ$'s are nonnegative independent identically distributed random
variables with common unknown distribution function F°. For the problem of
density estimation, it is assumed that F° is absolutely continuous with
density f°.

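The observed right-censored data are denoted by the pairs $(X_i, \Delta_i)$,
i=1,...,n, where

$$X_i = \min\{X_i^\circ, U_i\}, \qquad \Delta_i = \begin{cases} 1 & \text{if } X_i^\circ \le U_i, \\ 0 & \text{if } X_i^\circ > U_i. \end{cases}$$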

Thus, it is known which observations are times of failure or death and which
ones are censored or loss times. The nature of the censoring mechanism
depends on the $U_i$'s: (i) If $U_1, \ldots, U_n$ are fixed constants, the observations
are time-truncated. If all $U_i$'s are equal to the same constant, then
the case of Type I censoring results. (ii) If all $U_i = X_{(r)}^\circ$, the rth order
statistic of $X_1^\circ, \ldots, X_n^\circ$, then the situation is that of simple Type II censoring.
(iii) If $U_1, \ldots, U_n$ constitute a random sample from a distribution H (which
is usually unknown) and are independent of $X_1^\circ, \ldots, X_n^\circ$, then $(X_i, \Delta_i)$, i=1,2,...,n,
is called a randomly right-censored sample.

The random censorship model (iii) is assumed for the results presented
here. It is attractive because of its mathematical convenience. Assuming
this model, $\Delta_1, \ldots, \Delta_n$ are independent Bernoulli random variables and the
distribution function F of each $X_i$, i=1,...,n, is given by 1-F = (1-F°)(1-H).
Under the Koziol and Green (1976) model of random censorship, which is the
proportional hazards assumption of Cox (1972), it is assumed that there is a
positive constant β such that $1-H = (1-F^\circ)^{\beta}$. Then by a result of Chen,
Hollander, and Langberg (1982), the pairs $(X_i^\circ, U_i)$, i=1,...,n, follow the
proportional hazards model if and only if $(X_1, \ldots, X_n)$ and $(\Delta_1, \ldots, \Delta_n)$ are
independent. This Koziol-Green model of random censorship arises in several
situations (Efron, 1967; Csorgo and Horvath, 1981; Chen, Hollander and
Langberg, 1982). Note that β is a censoring coefficient since
$\alpha = P(X_i^\circ \le U_i) = (1+\beta)^{-1}$, which is the probability of an uncensored
observation.
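As an illustration of the random censorship model, the following minimal sketch draws a randomly right-censored sample with exponential life and censoring distributions, a case in which the Koziol-Green condition holds with β equal to the ratio of the life mean to the censoring mean. The function names `censored_sample` and `koziol_green_alpha` are hypothetical and are not taken from the original study.

```python
import random


def censored_sample(n, life_mean, cens_mean, rng):
    """Draw n pairs (X_i, Delta_i) under random right-censorship.

    Assumption for this sketch: both the life distribution and the
    censoring distribution are exponential, so 1-H = (1-F0)**beta
    with beta = life_mean / cens_mean.
    """
    sample = []
    for _ in range(n):
        x_true = rng.expovariate(1.0 / life_mean)  # true lifetime X_i^o
        u = rng.expovariate(1.0 / cens_mean)       # censoring time U_i
        delta = 1 if x_true <= u else 0            # 1 = uncensored
        sample.append((min(x_true, u), delta))
    return sample


def koziol_green_alpha(beta):
    """Probability of an uncensored observation, alpha = 1 / (1 + beta)."""
    return 1.0 / (1.0 + beta)


if __name__ == "__main__":
    rng = random.Random(0)
    data = censored_sample(20000, life_mean=5.0, cens_mean=1.0, rng=rng)
    observed = sum(delta for _, delta in data) / len(data)
    print("observed uncensored fraction:", round(observed, 3))
    print("(1 + beta)^(-1) with beta = 5:", round(koziol_green_alpha(5.0), 3))
```

With life mean 5 and censoring mean 1, β = 5, so the uncensored fraction should be close to 1/6.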

Based on the censored sample $(X_i, \Delta_i)$, i=1,...,n, a popular estimator
of the survival probability S°(t) = 1-F°(t) at t ≥ 0 is the product-limit
estimator, proposed by Kaplan and Meier (1958) as the "nonparametric maximum
likelihood estimator" of S°. This estimator was shown to be "self-consistent"
by Efron (1967).

Let $(Z_i, \Delta_i)$, i=1,...,n, denote the ordered $X_i$'s along with their
corresponding $\Delta_i$'s. A value of the censored sample will be denoted by the
corresponding lower case letters $(x_i, \delta_i)$ or $(z_i, \delta_i)$ for the unordered or
ordered sample, respectively. The product-limit estimator of S° is defined
by (Efron, 1967)

$$\hat{P}_n(t) = \begin{cases} 1, & 0 \le t \le Z_1, \\ \prod_{i=1}^{k-1} \left( \dfrac{n-i}{n-i+1} \right)^{\Delta_i}, & t \in (Z_{k-1}, Z_k],\ k=2,\ldots,n, \\ 0, & t > Z_n. \end{cases}$$

Denote the product-limit estimator of F°(t) by $\hat{F}_n(t) = 1-\hat{P}_n(t)$, and let
$s_j$ denote the jump of $\hat{P}_n$ (or $\hat{F}_n$) at $Z_j$, that is,

$$s_j = \begin{cases} 1-\hat{P}_n(Z_2), & j=1, \\ \hat{P}_n(Z_j) - \hat{P}_n(Z_{j+1}), & j=2,\ldots,n-1, \\ \hat{P}_n(Z_n), & j=n. \end{cases}$$

Note that $s_j = 0$ if and only if $\delta_j = 0$, j < n, that is, if $Z_j$ is a
censored observation.
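The jump sizes of the product-limit estimator can be computed in a single pass over the ordered sample. The sketch below follows the convention stated in the text, in which the last ordered observation receives all remaining mass whether or not it is censored. It assumes there are no tied observation times, and `product_limit_jumps` is a hypothetical name used only for illustration.

```python
def product_limit_jumps(sample):
    """Return (z, delta, jumps) for a right-censored sample.

    sample: list of (x_i, delta_i) with delta_i = 1 for an uncensored
    observation and 0 for a censored one. Assumes no tied times.
    """
    ordered = sorted(sample, key=lambda pair: pair[0])
    n = len(ordered)
    z = [t for t, _ in ordered]
    delta = [d for _, d in ordered]

    # p_at[k] holds the value of the survival estimate at the (k+1)th
    # ordered observation, i.e. the product over i = 1..k of
    # ((n - i) / (n - i + 1)) ** delta_i.
    p_at = [1.0]
    for i in range(1, n):
        factor = (n - i) / (n - i + 1) if delta[i - 1] == 1 else 1.0
        p_at.append(p_at[-1] * factor)

    # s_j is the drop between consecutive ordered observations for
    # j < n, and the remaining mass at the last observation for j = n.
    jumps = [p_at[j] - p_at[j + 1] for j in range(n - 1)]
    jumps.append(p_at[n - 1])
    return z, delta, jumps


if __name__ == "__main__":
    z, delta, s = product_limit_jumps([(3.0, 1), (1.0, 1), (4.0, 1), (2.0, 0)])
    for zj, dj, sj in zip(z, delta, s):
        print(zj, dj, round(sj, 4))
```

In this small example the censored observation at 2.0 receives a jump of zero, and the mass it would have carried is redistributed to the later observations.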

The  product-limit  estimator  has  played  a  central  role  in  the  analysis 
of  censored  survival  data  (Miller,  1981),  and  its  properties  have  been  studied 
extensively  by  many  authors,  for  example,  Breslow  and  Crowley  (1974),  Foldes, 
Rejto  and  Winter  (1980),  and  Wellner  (1982). 


3.  THE  KERNEL  DENSITY  ESTIMATORS 


Since  the  work  of  Rosenblatt  (1956)  and  Parzen  (1962),  kernel  density 
estimators  have  been  perhaps  the  most  popular  density  estimators  used  in 
practice  and  have  been  studied  extensively  regarding  their  theoretical 
properties.  Also,  various  modifications  with  respect  to  the  bandwidth  sequence 
and  kernel  have  been  proposed.  Until  recently,  all  of  the  results  were  for 
complete samples (see Fryer, 1977, or Bean and Tsokos, 1980). For randomly
right-censored data, the first results for kernel density estimators did not
appear until 1980.


Blum and Susarla (1980) generalized the complete-sample results of
Rosenblatt (1976) concerning maximum deviation of density estimates by the
kernel method. They obtained limit theorems for the maximum over a finite
interval of a normalized deviation of the density estimate when the observations
were censored on the right. The results were useful for goodness-of-fit tests
and tests of hypothesis about the unknown lifetime density f°. To define
the Blum-Susarla estimator based on the randomly censored observations
$(X_i, \Delta_i)$, i=1,...,n, let {h=h(n)} be a positive sequence converging to zero
as n → ∞ and let


$$N^+(x) = \text{number of } X_i\text{'s} > x.$$

Define

$$H_n^*(x) = \prod_{j=1}^{n} \left( \frac{1+N^+(X_j)}{2+N^+(X_j)} \right)^{I[\Delta_j = 0,\ X_j \le x]},$$

where $I[A]$ denotes the indicator function of the measurable set A. By a
modification of the product-limit estimator, it can be shown that $H_n^*$ is a
good estimate of the survival function for the censoring distribution,
$H^* = 1-H$ (Blum and Susarla, 1980). For a kernel function K satisfying
certain conditions, the Blum-Susarla estimator is defined by

$$f_n^*(x) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left( \frac{x-X_j}{h} \right) \frac{I[\Delta_j = 1]}{H_n^*(x)}. \tag{3.1}$$

By following standard arguments, $(f^\circ H^*)_n(x) = (nh)^{-1} \sum_{j=1}^{n} K((x-X_j)/h)\, I[\Delta_j = 1]$
and $H_n^*(x)$ can be shown to be good estimators
of $f^\circ(x) H^*(x)$ and $H^*(x)$, respectively. This motivates the use of (3.1)
as an estimator of f°(x).


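The main results of Blum and Susarla (1980) concern the asymptotic
distribution of

$$M_n = (nh)^{1/2} \sup_{0 \le x \le 1} \frac{\left| f_n^*(x) - [h H_n^*(x)]^{-1} E\!\left[ K\!\left( \dfrac{x-X_1}{h} \right) I[\Delta_1 = 1] \right] \right|}{[f^\circ(x)/H^*(x)]^{1/2}}$$

under various conditions on f°, K, and H.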

Foldes, Rejto and Winter (1981) obtained strong convergence results
for the kernel density estimator

$$\hat{f}_n(x) = h^{-1} \int_{-\infty}^{\infty} K\!\left( \frac{x-t}{h} \right) d\hat{F}_n(t), \tag{3.2}$$

which reduces to the usual Parzen (1962) density estimator in the case of
no censoring (since the product-limit estimator $\hat{F}_n$ reduces to the usual
empirical distribution function). Their results were obtained under various
conditions on H, F°, f°, and K, and they assumed that the bandwidth sequence
{h(n)} was such that h(n) → 0 but h(n) multiplied by a positive power of
n/log(n) tends to ∞ as n → ∞.

McNichols and Padgett (1981) wrote (3.2) as

$$\hat{f}_n(x) = h^{-1} \sum_{j=1}^{n} s_j K\!\left( \frac{x-Z_j}{h} \right), \tag{3.3}$$

where $Z_j$ is the jth ordered observation and $s_j$ denotes the jump of
$\hat{F}_n$ at $Z_j$. They considered the mean, variance, and mean-squared error of
(3.3) under the Koziol-Green model. Expressions for the mean and variance of
(3.3) at each x ≥ 0 were obtained, and asymptotic unbiasedness and mean-square
convergence were shown with K and {h(n)} satisfying the usual Parzen (1962)
conditions. Note that the sums in both (3.1) and (3.3) only explicitly include
the terms with uncensored observations although the censoring is treated
somewhat differently.
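To make the difference between (3.1) and (3.3) concrete, the following sketch evaluates both estimators at a single point with the standard normal kernel. It is an illustration under stated assumptions (no tied observation times, the last ordered observation receives the remaining product-limit mass), not a reproduction of the original Fortran programs; the names `f_hat` and `f_star` are hypothetical.

```python
import math


def normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f_hat(x, sample, h, kernel=normal_kernel):
    """Estimator (3.3): h^{-1} * sum_j s_j * K((x - Z_j) / h)."""
    ordered = sorted(sample, key=lambda pair: pair[0])
    n = len(ordered)
    surv = 1.0  # product-limit survival estimate at the current Z_j
    total = 0.0
    for j, (z, delta) in enumerate(ordered, start=1):
        if j < n:
            nxt = surv * (n - j) / (n - j + 1) if delta == 1 else surv
            jump = surv - nxt
        else:
            nxt, jump = 0.0, surv  # remaining mass at the last point
        total += jump * kernel((x - z) / h)
        surv = nxt
    return total / h


def f_star(x, sample, h, kernel=normal_kernel):
    """Estimator (3.1): kernel sum over uncensored points / H*_n(x)."""
    n = len(sample)
    numer = sum(kernel((x - xj) / h) for xj, dj in sample if dj == 1) / (n * h)
    h_star = 1.0
    for xj, dj in sample:
        if dj == 0 and xj <= x:
            n_plus = sum(1 for xi, _ in sample if xi > xj)  # N+(X_j)
            h_star *= (1 + n_plus) / (2 + n_plus)
    return numer / h_star


if __name__ == "__main__":
    data = [(0.5, 1), (0.9, 0), (1.2, 1), (2.0, 1), (2.4, 0)]
    print("f_hat(1.0):", round(f_hat(1.0, data, 0.4), 4))
    print("f_star(1.0):", round(f_star(1.0, data, 0.4), 4))
```

When every observation is uncensored, all jumps equal 1/n and the product in the denominator is empty, so both functions reduce to the usual Parzen estimator.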


The  small-sample  properties  of  (3.1)  and  (3.3)  have  not  been  studied 
previously,  either  analytically  or  by  computer  simulations,  other  than  under 
the  restrictions  of  the  Koziol-Green  model  (McNichols  and  Padgett,  1981). 

In  the  next  section,  a  rather  extensive  Monte  Carlo  simulation  study  of  the 
estimators  (3.1)  and  (3.3)  for  small  sample  sizes  will  be  described,  and  some 
representative  results  will  be  presented. 

It should be mentioned that a modification of $\hat{f}_n$ in which the bandwidth
h is data-driven has been given by McNichols and Padgett (1984). It was
shown that if $h = h(X_1, \ldots, X_n)$ is a "nearest neighbor" type function, then
the conditions for consistency of the modified estimator hold. Also, it should
be remarked that the data-based algorithms for choosing h in the complete
sample case discussed by Scott and Factor (1981) do not seem to be fruitful for
the case of censored samples. In particular, an expression similar to their
(2.4) (see also Parzen, 1962), and hence (2.10), is not available in the
censored data case and seems to be extremely difficult to obtain (McNichols
and Padgett, 1981). A likelihood approach corresponding to their expression
(2.8) does not seem to be feasible either, since for censored data, the survival
function corresponding to f° appears in the likelihood function. Hence, in
the simulation study described in the next section, some attention is given to
estimating the mean squared errors of the kernel estimators as a function of
various bandwidth values. This gives an indication of the range of values
of h which tend to minimize mean squared errors of both (3.1) and (3.3) in
the cases simulated.

4.  THE  MONTE  CARLO  SIMULATIONS 

Simulations  were  performed  for  randomly  right-censored  samples  generated 
from  five  different  families  of  life  distributions  commonly  used  in  the 


literature: exponential with mean β, denoted E(β), gamma with parameters
α and β, denoted G(α,β), Weibull with density
$f(x; \alpha, \beta) = \frac{\alpha}{\beta} \left( \frac{x}{\beta} \right)^{\alpha-1} \exp[-(x/\beta)^{\alpha}]$, x > 0, denoted W(α,β), lognormal
with mean $\exp(\alpha + \tfrac{1}{2}\beta^2)$, denoted L(α,β), and inverse Gaussian with density
$f(x; \mu, \lambda) = [\lambda/(2\pi x^3)]^{1/2} \exp[-\lambda(x-\mu)^2/(2\mu^2 x)]$, x > 0, denoted IG(μ,λ).

Two different types of censoring distributions were utilized, exponential
with mean one and uniform on $(0, t_q)$, where $t_q$ denotes the qth percentile
of the life distribution, 0 < q < 100. Three different kernel functions K
were used, the standard normal density, the uniform density on [-1,1], and
the triangular density on [-1,1],

$$K(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

In addition, several bandwidth values h = h(n) were used in the study,
including $h(n) = n^{-p}$ for various values of p.
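The three kernels and the bandwidth sequences of the form n^{-p} are simple to write down. The sketch below is only illustrative: the helper `integrate` is a hypothetical midpoint-rule check that each kernel integrates to one, and the choice p = 1/2, 1/3, 1/5 follows the values named later in the conclusions.

```python
import math


def normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def uniform_kernel(u):
    """Uniform density on [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def triangular_kernel(u):
    """Triangular density on [-1, 1]: K(u) = 1 - |u| for |u| <= 1."""
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0


def bandwidth(n, p):
    """Bandwidth sequence h(n) = n ** (-p)."""
    return n ** (-p)


def integrate(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule integral, used only to check that K integrates to 1."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    for name, k in [("normal", normal_kernel),
                    ("uniform", uniform_kernel),
                    ("triangular", triangular_kernel)]:
        print(name, round(integrate(k), 4))
    for n in (50, 100):
        print(n, [round(bandwidth(n, p), 3) for p in (1 / 2, 1 / 3, 1 / 5)])
```

For n = 50 and n = 100 these three bandwidths all lie between .05 and .55.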

The simulations represented in Tables 4.1-4.7 and 4.12-4.15 were based on
1,000 randomly right-censored samples each of size n for each choice of life
distribution, censoring distribution, kernel function, and bandwidth value for
n = 20, 50, 100, and 300. For each sample, the estimates (3.1) and (3.3) were
computed for values of t = 5th, 10th, 20th, ..., 90th, and 95th percentiles of
the censoring distribution (t values of 5th, 50th, and 95th percentiles only are
reported in these tables). At each t, the bias, mean squared error (MSE), and
variance of the estimators were estimated from the 1,000 computed values. Also,
the standard error of the estimate of MSE at each t was computed for each
estimator. The standard errors were bounded by $10^{-2}$.

The  computer  programs  for  the  simulations  were  written  in  Fortran  on 
an  Amdahl  470  V611  computer.  The  random  number  generators  contained  in  the 


International  Mathematical  and  Statistical  Libraries  (1980) (IMSL)  package  were 
used  in  the  generation  of  the  required  samples.  Uniform  random  numbers  were 
generated  from  the  IMSL  subroutine  GGUBS.  IMSL  subroutine  GGEXN  was  used 
for  the  exponential  random  numbers,  GGAMR  for  gamma,  GGWIB  for  Weibull,  and 
GGNLG  for  lognormal  random  numbers.  To  generate  a  value  x  from  the  inverse 
Gaussian  distribution,  the  procedure  given  by  Michael,  Schucany,  and  Haas  (1976) 
was  used. 
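For readers without access to the original routines, the transformation-with-multiple-roots method usually attributed to Michael, Schucany, and Haas (1976) can be sketched as follows. This is a reconstruction from the commonly stated form of that method, not the code used in the study, and `inverse_gaussian` is a hypothetical name.

```python
import math
import random


def inverse_gaussian(mu, lam, rng):
    """Draw one value from IG(mu, lam).

    Uses the fact that lam * (X - mu)**2 / (mu**2 * X) is chi-square
    with one degree of freedom, solves the resulting quadratic for its
    smaller root, and then chooses between the two roots at random.
    """
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu  # chi-square(1) variate
    root = math.sqrt(4.0 * mu * lam * y + mu * mu * y * y)
    x = mu + (mu * mu * y) / (2.0 * lam) - (mu / (2.0 * lam)) * root
    # Accept the smaller root with probability mu / (mu + x);
    # otherwise return the larger root mu**2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


if __name__ == "__main__":
    rng = random.Random(12345)
    draws = [inverse_gaussian(1.0, 2.0, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean (theory: mu = 1):", round(mean, 3))
    print("sample variance (theory: mu^3 / lam = 0.5):", round(var, 3))
```

The sample mean and variance give a quick sanity check against the known moments of the inverse Gaussian distribution, μ and μ³/λ.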

The Monte Carlo simulations were performed in the following manner:
A random sample $X_1^\circ, \ldots, X_n^\circ$ was generated from the life distribution, and a
random sample $U_1, \ldots, U_n$ was generated from the censoring distribution. Next,
the randomly right-censored sample $(X_i, \Delta_i)$, i=1,...,n, was obtained by
$X_i = \min\{X_i^\circ, U_i\}$, $\Delta_i = 1$ if $X_i = X_i^\circ$ and $\Delta_i = 0$ if $X_i = U_i$. The
values $X_1, \ldots, X_n$ were ordered to yield $(Z_i, \Delta_i)$, i=1,...,n, and the
product-limit estimator was computed along with the jump size $s_i$ at
each $Z_i$. The estimators $f_n^*(t)$ and $\hat{f}_n(t)$ given by (3.1) and (3.3) were
computed at the appropriate values of t. This entire procedure was repeated
for 1,000 randomly right-censored samples. The average biases, mean squared
errors, and variances as well as the standard errors (all were bounded by $10^{-2}$)
of the estimated mean squared errors were computed for $f_n^*(t)$ and $\hat{f}_n(t)$ over
the 1,000 samples. The entire procedure was repeated for each sample size, life
distribution, censoring distribution, kernel function, and bandwidth value
mentioned before.
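The procedure just described can be sketched as a small Monte Carlo loop. The example below is a minimal illustration for one configuration only, assuming an E(1) life distribution, uniform censoring on (0, t_.90), the standard normal kernel, and the estimator (3.3). The function names, the seed, and the number of replications are choices made for this sketch and are not taken from the original programs.

```python
import math
import random


def normal_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f_hat(t, sample, h):
    """Estimator (3.3) built from the product-limit jumps (no ties assumed)."""
    ordered = sorted(sample, key=lambda pair: pair[0])
    n = len(ordered)
    surv, total = 1.0, 0.0
    for j, (z, delta) in enumerate(ordered, start=1):
        if j < n:
            nxt = surv * (n - j) / (n - j + 1) if delta == 1 else surv
            jump = surv - nxt
        else:
            nxt, jump = 0.0, surv
        total += jump * normal_kernel((t - z) / h)
        surv = nxt
    return total / h


def simulate(n, h, t, reps, seed):
    """Estimate bias, variance, MSE, and the standard error of the MSE."""
    rng = random.Random(seed)
    t90 = -math.log(0.10)          # 90th percentile of E(1)
    true_density = math.exp(-t)    # E(1) density at t
    estimates = []
    for _ in range(reps):
        sample = []
        for _ in range(n):
            x_true = rng.expovariate(1.0)
            u = rng.uniform(0.0, t90)
            sample.append((min(x_true, u), 1 if x_true <= u else 0))
        estimates.append(f_hat(t, sample, h))
    mean = sum(estimates) / reps
    bias = mean - true_density
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    sq_err = [(e - true_density) ** 2 for e in estimates]
    mse = sum(sq_err) / reps
    se_mse = math.sqrt(sum((s - mse) ** 2 for s in sq_err) / (reps - 1) / reps)
    return bias, variance, mse, se_mse


if __name__ == "__main__":
    bias, variance, mse, se_mse = simulate(n=50, h=0.30, t=1.15, reps=200, seed=1)
    print("bias:", bias)
    print("variance:", variance)
    print("MSE:", mse)
    print("standard error of MSE:", se_mse)
```

Because the variance is computed with divisor `reps`, the estimated MSE equals the estimated variance plus the squared estimated bias, which gives a convenient internal check.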

Some representative simulation results for $\hat{f}_n(t)$ are given in Tables 4.1
- 4.7. All of the results cannot be listed due to space limitations. All entries
in all tables are to be multiplied by $10^{-4}$.

In the hope of gaining some insight into the behavior of $f_n^*$ and $\hat{f}_n$ with
respect to the bandwidth values h, several cases were simulated, using 200
samples each (instead of 1000, due to computer time constraints), in which
the estimated MSE was obtained as a function of h. The estimated MSE was
obtained for $f_n^*$ and $\hat{f}_n$ at values h = .05(.05).55, that is, from .05 to .55
in steps of .05. For sample sizes
n=50 and 100 some representative results are shown in Tables 4.8-4.11. Note
that the range of h values contains $n^{-1/2}$, $n^{-1/3}$, and $n^{-1/5}$ within the
boundaries for these sample sizes. The results indicate that $f_n^*$ and $\hat{f}_n$ tend
to behave similarly with respect to MSE. Therefore, in order to indicate a
comparison of the behavior of $f_n^*$ and $\hat{f}_n$ as density estimators when the
same bandwidth values are used, some representative simulation results are
listed in Tables 4.12-4.15. In these tables α = P(an uncensored observation)
$= P(X_i^\circ \le U_i)$.

5.  CONCLUSIONS 


Several conclusions concerning the small-sample behavior of the kernel
density estimators $\hat{f}_n$ and $f_n^*$ can be stated based on the extensive simulations
described in Section 4. In particular, the simulation results indicate
the following for $\hat{f}_n$: The estimated variances of $\hat{f}_n(t)$ increase as the
bandwidth sequence {h(n)} varies from $n^{[\ldots]}$ to $n^{[\ldots]}$. For $h(n) = n^{[\ldots]}$
the variances of $\hat{f}_n(t)$ decrease as n increases, but probably do not
converge to zero uniformly in t. For small values of t, the bias of
$\hat{f}_n(t)$ is larger in magnitude than the biases for moderate to large values
of t. Overall, with respect to the criterion of mean squared error, for both
$f_n^*$ and $\hat{f}_n$ with moderate to large t, $h(n) = n^{-1/5}$ appears to be the best
choice for the bandwidth among the values $h(n) = n^{-p}$, p = 1/2, 1/3, 1/5,
whereas for small t, $n^{-1/2}$ or $n^{-1/3}$ appears better with respect to mean
squared error (Tables 4.1-4.7). This is supported by the representative
results in Tables 4.8-4.11. Of the three kernel functions studied, the
Table 4.8.  Estimated MSE of Kernel Density Estimators

Life Distribution: E(1),  Censoring Distribution: U(0,t_.90)
Kernel: N(0,1)
(All entries are to be multiplied by 1.0E-04.)

(a)  n=50

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.23)   a.     830   390   240   310   420   610   760   980  1130  1280  1450
            b.     820   390   250   330   470   700   900  1160  1350  1530  1760
.25 (.58)   a.     820   390   190   110   100    50    50    60    70   100  1400
            b.     810   390   190   110   110    60    60    60    60    90  1410
.50 (1.15)  a.     600   330   200   140   100   100    60    50    50    50    40
            b.     540   310   190   140   110   110    80    80   100   110   130
.75 (1.73)  a.    1590   700   290   230   170   130    90   100    90    70    80
            b.     720   290   180   130   110   130    90    90   140   230   240
.90 (2.07)  a.    2670  1110   750   440   290   180   150   110    80    70    70
            b.     290   200   150   120   100    60    50    50   120   100   160

(b)  n=100

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.23)   a.     500   190   120   200   340   550   740   960  1130  1300  1450
            b.     510   200   120   220   400   640   880  1150  1350  1560  1750
.25 (.58)   a.     380   140    90    60    50    30    30    40    60    90   130
            b.     380   140    90    60    60    30    30    30    50    80   130
.50 (1.15)  a.     350   160    90    60    50    40    30    30    30    30    30
            b.     340   160    90    70    60    50    50    70   100   110   110
.75 (1.73)  a.     530   220   120   110    90    70    70    70    60    60    60
            b.     340   140    90    80    80    70   100    80    90   150   250
.90 (2.07)  a.    2130   950   390   310   200   150   110    90    70    60    50
            b.     420   150   110    80    80    70   110   140   240   260   380

a.  MSE $\hat{f}_n$,   b.  MSE $f_n^*$

Table 4.9.  Estimated MSE of Kernel Density Estimators

Life Distribution: W(2,1),  Censoring Distribution: U(0,t_[...])
Kernel: N(0,1)
(All entries are to be multiplied by 1.0E-04.)

(a)  n=50

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.15)   a.     390   150   100    80    80    60    70    70    60    60    60
            b.     380   140   100    70    50    50    40    40    40    40    50
.25 (.38)   a.     850   460   230   180   170   210   210   240   280   320   370
            b.     850   460   260   250   290   430   480   600   740   870   980
.50 (.76)   a.    1880   810   400   240   240   220   290   350   460   600   770
            b.    1770   830   400   290   290   350   450   580   780   920  1130
.75 (1.14)  a.    3020  1230   580   290   170   120    50    40    30    60   100
            b.    2150   770   680   440   350   750   640   620   470   850   480
.90 (1.37)  a.    6280  1860   910   450   280   170   110    80    50    30    70
            b.    1890   910   590   530   650    90  1250  1920  2550  3380  3360

(b)  n=100

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.15)   a.     180    80    50    50    40    40    40    40    40    50    40
            b.     180    80    50    50    30    30    20    30    30    30    40
.25 (.38)   a.     510   230   130   100   120   150   180   230   270   330   380
            b.     500   240   140   160   250   350   460   590   720   850   980
.50 (.76)   a.     870   340   170   150   140   170   250   340   470   620   790
            b.     840   330   180   170   170   230   360   520   700   900   [?]
.75 (1.14)  a.    1100   620   340   160   110    60    30    20    30    50    90
            b.    1020   510   400   260   300   400   390   420   420   350   330
.90 (1.37)  a.    3640  1410   520   320   190   110    80    50    30    20    10
            b.    1350   610   640   840  1450  2760  3540  4720  7980  5760  7700

a.  MSE $\hat{f}_n$,   b.  MSE $f_n^*$

Table 4.10.  Estimated MSE of Kernel Density Estimators

Life Distribution: W(.5,1),  Censoring Distribution: U(0,t_.75)
Kernel: N(0,1)
(All entries are to be multiplied by 1.0E-04.)

(a)  n=50

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.19)   a.     880   550   410   170   100   160   300   470   660   800   990
            b.     900   640   500   200   120   160   290   490   690   860  1080
.25 (.48)   a.     660   210   160   160   170   170   160   120    90    80    50
            b.     640   210   170   200   260   320   300   260   180   160   100
.50 (.96)   a.     440   200   150   130    90    60    70   110   120   170   190
            b.     400   190   140   130    90    70   110   140   190   280   330
.75 (1.44)  a.    1540   530   520   430   390   360   330   330   310   270   270
            b.     600   220   160   130   110    70    60   100   130   240   300
.90 (1.73)  a.    7440  5040  2700  1930  1190   910   700   540   450   330   300
            b.     470   270   180   130   130    90   100   150   200   310   330

(b)  n=100

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.19)   a.     420   370   260   120    60   120   270   440   640   820   980
            b.     420   450   340   160    60   110   270   450   660   890  1070
.25 (.48)   a.     200   130    80    90   130   150   140   100    70    50    40
            b.     200   130    90   130   220   290   270   230   180   130    80
.50 (.96)   a.     210   100    50    40    40    40    50    70    90   130   160
            b.     200   100    50    50    50    60    90   140   180   250   320
.75 (1.44)  a.     750   190   150   140   190   200   250   240   240   250   240
            b.     360   140    70    70    40    50    60    70    80   160   260
.90 (1.73)  a.    4320  2410  2090  1530  1170   820   650   490   410   340   280
            b.     330   170   170   100   130   110   170   180   360   470   650

Table 4.11.  Estimated MSE of Kernel Density Estimators

(a)  n=50

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.11)   a.    1390   650   400   900  1670  2370  3090  3540  4040  4580  4930
            b.    1520   780   390   920  1730  2560  3300  3850  4410  5000  5470
.25 (.29)   a.     820   380   400   350   200   110    60    50    70   130   180
            b.     820   420   590   590   410   220   140    70    60    90   150
.50 (.69)   a.     550   190   150   130   110   120    90   130   140   120   140
            b.     510   180   150   150   150   310   290   450   590   550   660
.75 (1.39)  a.     680   520   240   210   130   120   110   110    90    90   100
            b.     530   280   150   120    80    60    90    90   110   140   270
.90 (2.30)  a.    1950   650   510   370   350   200   180   150   120   120   100
            b.     420   150    90    70    80    50    50    30    40    40    30

(b)  n=100

p \ h              .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55
.10 (.11)   a.     980   360   270   830  1670  2350  2910  3630  4040  4480  4870
            b.    1090   440   260   830  1730  2520  3120  3960  4400  4900  5330
.25 (.29)   a.     310   260   260   230   140    80    30    30    50   100   150
            b.     310   310   430   470   330   190   100    40    30    60   110
.50 (.69)   a.     310   120    90    50    60    80    80   110   110   100    90
            b.     300   120    90    60   130   230   310   470   490   510   520
.75 (1.39)  a.     380   270   100    70    60    50    40    50    40    40    40
            b.     320   160    80    60    50    50    50    60    90   150   220
.90 (2.30)  a.     670   420   290   230   140   120   100    90    60    50    50
            b.     200   100    80    60    50    40    40    30    30    20    20

a.  MSE $\hat{f}_n$,   b.  MSE $f_n^*$


Table 4.12.  Comparison of $\hat{f}_n$ and $f_n^*$ for Small Samples

K - Standard Normal     h(n) = n^{[...]}
Life Distribution: E(5)     Censoring Distribution: E(1)
(All entries to be multiplied by 10^{-4}.)     α = 1/6

                          $\hat{f}_n(t)$                  $f_n^*(t)$
f°(t) (t)       n      Bias  Variance   MSE      Bias  Variance   MSE
                10      242     230     236      -778      35      96
                20      770     314     373      -714      44      95
(34700)         50     1435     406     612      -480      81     104
               100     1259     410     568      -339      86      98

                10     -474       6      28      -497       1      25
500             20     -464       9      30      -498       0      25
(69300)         50     -385      48      63      -496       1      25
               100     -373      41      55      -489       2      26


Table 4.13.  Comparison of $\hat{f}_n$ and $f_n^*$ for Small Samples

K - Standard Normal     h(n) = n^{[...]}
Life Distribution: E(1)     Censoring Distribution: E(10)
(All entries to be multiplied by 10^{-4}.)     α = 10/11

                          $\hat{f}_n(t)$                  $f_n^*(t)$
f°(t) (t)       n      Bias  Variance   MSE      Bias  Variance   MSE
                10    -6127      65    3819     -6242      65    3961
                20    -5913      41    3538     -6022      41    3668
                50    -5671      24    3224     -5771      24    3354
               100    -5463      17    3001     -5554      17    3101

                10    -3568      66    1339     -3654      68    1402
                20    -3222      42    1080     -3297      44    1131
                50    -2779      24     797     -2832      25     827
               100    -2419      18     602     -2458      18     622

                10     -865      47     122      -873      52     128
                20     -562      32      63      -541      36      65
(6931)          50     -241      19      25      -205      19      24
               100      -73      13      14       -35      14      14

                10      383      46      61       408      48      64
                20      359      32      45       404      34      51
(1390)          50      274      18      25       325      19      29
               100      203      11      16       240      12      17

                10      107      26      27        56      22      22
                20       60      15      15        29      14      14
(29957)         50       59       8       8        49       8       8
               100       37       4       4        36       4       5

Table 4.14.  Comparison of $\hat{f}_n$ and $f_n^*$ for Small Samples

K - Uniform [-1,1]     h(n) = n^{[...]}
Life Distribution: E(5)     Censoring Distribution: E(1)
(All entries to be multiplied by 10^{-4}.)     α = 1/6

                          $\hat{f}_n(t)$                  $f_n^*(t)$
f°(t) (t)       n      Bias  Variance   MSE      Bias  Variance   MSE
                10      322     176     186       622      86     124
1900            50      480      42      65       582      34      68
(2600)         100      381      25      39       450      21      42
               300      186      11      14       208      10      14

                10     1773     840    1153       103     289     290
1500            50       16     148     148        53     122     122
(14400)        100        4      72      72         1      70      70
               300       10      32      32         4      32      32

                10      142     526     528       808      65     130
1000            50     1543    1060    1297      1044     144     173
(34700)        100     1320     976    1149       415     159     176
               300      221     324     329       123     145     147

                10      479       9      32       495       1      26
500             50      376      94     108       500       0      25
(69300)        100      398      69      84       486       5      29
               300      241     179     185       466      13      35



standard normal density seems to be the best choice. Other kernel functions
which closely fit the standard normal (Parzen, 1962) may perform as well but
were not included in this study. The estimator $\hat{f}_n$ is fairly robust with
respect to the life distribution, and $\hat{f}_n$ performs well near the "center"
of the life distribution regardless of whether the censoring distribution is
exponential or uniform.

From the results represented by Tables 4.8-4.11, it is evident that for
each x, the estimated MSE appears to have at least a relative minimum at
some value of h. These seem to occur near the values $n^{-1/2}$, $n^{-1/3}$, and
$n^{-1/5}$, although it seems to be difficult to prove this result analytically,
as mentioned before. Based on these results and the results represented by
Tables 4.1-4.7 that the estimated variances of $\hat{f}_n$ increase as h ranges
from $n^{[\ldots]}$ to $n^{[\ldots]}$, the value $h = n^{[\ldots]}$ or $n^{[\ldots]}$ seems to be a
reasonable choice for the bandwidth in practice.

The above conclusions indicate that the bias, variance, and mean squared
error of $\hat{f}_n(t)$ decrease as n becomes larger, regardless of the life
distribution or censoring distribution. This leads to the conjecture that
the asymptotic results of McNichols and Padgett (1981) hold without the
condition of the Koziol-Green (or proportional hazards) model of random
censorship. Most of the cases simulated do not satisfy the condition of that
model. An analytical proof of this conjecture, however, would be quite difficult.

The simulation results represented by Tables 4.8-4.11 indicate that, with
respect to estimated mean squared error, $f_n^*$ and $\hat{f}_n$ behave similarly as
the bandwidth values vary. The simulations (Tables 4.12-4.15) also indicate
that the Blum-Susarla estimator $f_n^*$ and the estimator $\hat{f}_n$ perform about the
same with respect to bias, variance, and mean squared error when α = P(uncensored
observation) is larger than 0.5. When α < 1/2, $f_n^*$ tends to have smaller
variance and mean squared error than $\hat{f}_n$. This can probably be explained
by noting that $f_n^*$ depends upon having a good estimate of the censoring
survival function 1-H in the denominator, and when there is a large portion
of the observations which are censored, the denominator $H_n^*$ of (3.1) would
give a good estimate of 1-H. Hence, when α is small, $f_n^*$ would generally
provide a better density estimate than $\hat{f}_n$ with respect to smaller variance
and mean squared error.